101 research outputs found

    Predicting Social Links for New Users across Aligned Heterogeneous Social Networks

    Full text link
    Online social networks have gained great success in recent years and many of them involve multiple kinds of nodes and complex relationships. Among these relationships, social links among users are of great importance. Many existing link prediction methods focus on predicting social links that will appear in the future among all users based upon a snapshot of the social network. In real-world social networks, many new users are joining in the service every day. Predicting links for new users are more important. Different from conventional link prediction problems, link prediction for new users are more challenging due to the following reasons: (1) differences in information distributions between new users and the existing active users (i.e., old users); (2) lack of information from the new users in the network. We propose a link prediction method called SCAN-PS (Supervised Cross Aligned Networks link prediction with Personalized Sampling), to solve the link prediction problem for new users with information transferred from both the existing active users in the target network and other source networks through aligned accounts. We proposed a within-target-network personalized sampling method to process the existing active users' information in order to accommodate the differences in information distributions before the intra-network knowledge transfer. SCAN-PS can also exploit information in other source networks, where the user accounts are aligned with the target network. In this way, SCAN-PS could solve the cold start problem when information of these new users is total absent in the target network.Comment: 11 pages, 10 figures, 4 table

    Signed Distance-based Deep Memory Recommender

    Full text link
    Personalized recommendation algorithms learn a user's preference for an item by measuring a distance/similarity between them. However, some of the existing recommendation models (e.g., matrix factorization) assume a linear relationship between the user and item. This approach limits the capacity of recommender systems, since the interactions between users and items in real-world applications are much more complex than the linear relationship. To overcome this limitation, in this paper, we design and propose a deep learning framework called Signed Distance-based Deep Memory Recommender, which captures non-linear relationships between users and items explicitly and implicitly, and work well in both general recommendation task and shopping basket-based recommendation task. Through an extensive empirical study on six real-world datasets in the two recommendation tasks, our proposed approach achieved significant improvement over ten state-of-the-art recommendation models

    When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events

    Full text link
    Predicting both the time and the location of human movements is valuable but challenging for a variety of applications. To address this problem, we propose an approach considering both the periodicity and the sociality of human movements. We first define a new concept, Social Spatial-Temporal Event (SSTE), to represent social interactions among people. For the time prediction, we characterise the temporal dynamics of SSTEs with an ARMA (AutoRegressive Moving Average) model. To dynamically capture the SSTE kinetics, we propose a Kalman Filter based learning algorithm to learn and incrementally update the ARMA model as a new observation becomes available. For the location prediction, we propose a ranking model where the periodicity and the sociality of human movements are simultaneously taken into consideration for improving the prediction accuracy. Extensive experiments conducted on real data sets validate our proposed approach

    Multilabel Consensus Classification

    Full text link
    In the era of big data, a large amount of noisy and incomplete data can be collected from multiple sources for prediction tasks. Combining multiple models or data sources helps to counteract the effects of low data quality and the bias of any single model or data source, and thus can improve the robustness and the performance of predictive models. Out of privacy, storage and bandwidth considerations, in certain circumstances one has to combine the predictions from multiple models or data sources to obtain the final predictions without accessing the raw data. Consensus-based prediction combination algorithms are effective for such situations. However, current research on prediction combination focuses on the single label setting, where an instance can have one and only one label. Nonetheless, data nowadays are usually multilabeled, such that more than one label have to be predicted at the same time. Direct applications of existing prediction combination methods to multilabel settings can lead to degenerated performance. In this paper, we address the challenges of combining predictions from multiple multilabel classifiers and propose two novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and MLCM-a (MLCM for microAUC). These algorithms can capture label correlations that are common in multilabel classifications, and optimize corresponding performance metrics. Experimental results on popular multilabel classification tasks verify the theoretical analysis and effectiveness of the proposed methods

    Discovering Organizational Correlations from Twitter

    Full text link
    Organizational relationships are usually very complex in real life. It is difficult or impossible to directly measure such correlations among different organizations, because important information is usually not publicly available (e.g., the correlations of terrorist organizations). Nowadays, an increasing amount of organizational information can be posted online by individuals and spread instantly through Twitter. Such information can be crucial for detecting organizational correlations. In this paper, we study the problem of discovering correlations among organizations from Twitter. Mining organizational correlations is a very challenging task due to the following reasons: a) Data in Twitter occurs as large volumes of mixed information. The most relevant information about organizations is often buried. Thus, the organizational correlations can be scattered in multiple places, represented by different forms; b) Making use of information from Twitter collectively and judiciously is difficult because of the multiple representations of organizational correlations that are extracted. In order to address these issues, we propose multi-CG (multiple Correlation Graphs based model), an unsupervised framework that can learn a consensus of correlations among organizations based on multiple representations extracted from Twitter, which is more accurate and robust than correlations based on a single representation. Empirical study shows that the consensus graph extracted from Twitter can capture the organizational correlations effectively.Comment: 11 pages, 4 figure

    Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification

    Full text link
    Mining discriminative subgraph patterns from graph data has attracted great interest in recent years. It has a wide variety of applications in disease diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the graph representation alone. However, in many real-world applications, the side information is available along with the graph data. For example, for neurological disorder identification, in addition to the brain networks derived from neuroimaging data, hundreds of clinical, immunologic, serologic and cognitive measures may also be documented for each subject. These measures compose multiple side views encoding a tremendous amount of supplemental information for diagnostic purposes, yet are often ignored. In this paper, we study the problem of discriminative subgraph selection using multiple side views and propose a novel solution to find an optimal set of subgraph features for graph classification by exploring a plurality of side views. We derive a feature evaluation criterion, named gSide, to estimate the usefulness of subgraph patterns based upon side views. Then we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph features by integrating the subgraph mining process and the procedure of discriminative feature selection. Empirical studies on graph classification tasks for neurological disorders using brain networks demonstrate that subgraph patterns selected by the multi-side-view guided subgraph selection approach can effectively boost graph classification performances and are relevant to disease diagnosis.Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM) 201
    • …
    corecore